ABSTRACT

Many modern applications require the evaluation of analytical queries on large amounts of data. Such queries entail joins and heavy aggregations that often include user-defined functions (UDFs). The most efficient way to process this specific type of query is to use tree execution plans. In this work, we develop an engine for analytical query processing and a suite of specialized techniques that collectively take advantage of the tree form of such plans. The engine executes these tree plans in an elastic IaaS cloud infrastructure and dynamically adapts by allocating and releasing pertinent resources based on the query workload monitored over a sliding time window. The engine offers its services for a fee according to service-level agreements (SLAs) associated with the incoming queries; its management of cloud resources aims at maximizing the profit after removing the costs of using these resources. We have fully implemented our algorithms in the Exareme dataflow processing system. We present an extensive evaluation that demonstrates that our approach is very efficient (exhibiting fast response times), elastic (successfully adjusting the cloud resources it uses as the engine continually adapts to query workload changes), and profitable (approximating very well the maximum difference between SLA-based income and cloud-based expenses).


1. INTRODUCTION 

Many modern applications face the need to process voluminous data using ad-hoc analytical queries [31, ?]. They also call for complex user-defined functions (UDFs) that do not come from a predefined set of operators with well-known semantics, and for which SQL proper is often insufficient or inefficient. Furthermore, these queries must exhibit very fast, near-interactive response times [?]. It has been shown that, in appropriate computational environments such as shared-nothing clusters, specific tree execution plans can answer queries of the above kind on trillions of objects in seconds [?]. Figure 1 shows a generic image of such a tree execution plan: the leaves of the tree represent the data, which are partitioned appropriately based on the application. The remaining nodes represent operators (e.g., group-bys), and the connections between them correspond to operator dependencies. The operators at the first level ($L_0$) typically perform joins and filtering. The internal operators (levels $L_1$ to $L_{n-2}$) perform partial aggregations. Finally, the root operator (level $L_{n-1}$) performs global aggregations and produces the final result.

Several systems have been proposed for large-scale data processing [?]; they are typically built on top of IaaS clouds [?, 18], which have emerged as an attractive platform for analytical query processing. The defining characteristic that favors IaaS clouds over other competing environments (such as distributed, cluster-based, grid, etc.) is elasticity, i.e., the ability to lease compute and storage resources on demand and use them only for as long as needed. This makes it possible to create an elastic virtual infrastructure that may change over time. IaaS clouds offer compute resources in the form of virtual machines (VMs). The cost of leasing a VM is determined by a per-time-quantum pricing scheme, where one pays for the entire quantum regardless of how much of the VM resources is actually used [?]. An elastic cloud-enabled engine may allocate or de-allocate VMs dynamically, trying to identify the optimal trade-off between the need to minimize execution times for a given workload and the requirement to minimize the monetary cost of using the cloud resources.

In this work, we develop an elastic processing engine operating atop an IaaS infrastructure that is capable of executing, efficiently and cost-effectively, a large class of analytical queries whose execution plans have a specific tree form. We have implemented this functionality within Exareme [21, 39], our system for dataflow execution on the cloud. Figure 2 depicts the salient characteristics of our engine: arbitrarily complex queries, possibly having UDFs with arbitrary user code, are continually submitted to the engine. Each query is associated with an SLA that designates the price that a query instigator must pay for answering the query depending on its response time (faster response times are associated with higher prices). The data is originally stored on the cloud (e.g., Amazon S3) and is partitioned to increase flexibility and performance.

Figure 2: Engine for Elastic Analytical Query Processing.

In this context, our proposed engine and its requisite mechanisms make the following contributions:

• We introduce an online algorithm that exploits the elasticity of IaaS clouds to adapt the size of the virtual infrastructure to the query workload at hand by dynamically allocating or de-allocating VMs. This is done so that our engine maximizes its profit while taking into account the monetary cost of expended cloud resources as well as the SLAs of the submitted queries.

• We propose to lay out the allocated VMs in a "tree" shape (Figure ?), so that query execution plans are mapped naturally to IaaS processing elements. The VMs at the leaf level fetch data from the cloud storage and cache it on their local (virtual) disks for processing, thereby decoupling compute and storage resources. For partition assignments, we use an extension of consistent hashing and devise a simple, yet quite accurate, analytical formula to approximate the cost of partition reassignment; our online algorithm uses this formula when it searches for an optimal change in the deployment of resources at the data level $L_0$ (as shown in Figure ?).

• We have implemented our approach within Exareme and have performed an extensive experimental evaluation whose results are significant and very promising. Our method compares favorably to Cloudera Impala on sheer performance, offering near-interactive response times; it adapts quickly to workload changes; and it increases the processing engine's profit significantly compared to static infrastructures.


The rest of this paper is organized as follows: Section 2 offers motivating query examples from two key classes of contemporary query processing, and Section 3 discusses the operating environment. Section 4 outlines the intuition for our suggested solution, and Section 5 presents the proposed query engine. Section ? furnishes our key experimental findings, while related work and conclusions are found in Sections ? and ?, respectively.


2. MOTIVATION - TREE QUERIES 

We draw our motivation from key classes of analytic queries frequently encountered in data warehouses and NoSQL systems.

i) Data warehouses store historical data used to help understand market trends and create management reports [?]. Typical queries perform joins and extensive aggregations, and usually return only heavy hitters (the top records as ordered on some columns) [?]. The following SQL query, inspired by the TPC-H benchmark [?], shows such an example:

SELECT year, country,
       SUM(l_extendedprice) AS revenue,
       SUMMARY(l_extendedprice) AS report
FROM lineitem, supplier, nation
WHERE l_suppkey = s_suppkey
  AND s_nationkey = n_nationkey
GROUP BY year, country
ORDER BY year, country;


The query joins three tables, groups the results by country and year, and computes the revenue for each group. It also uses the SUMMARY UDF to generate a report on the output of each group.

The typical schema of a data warehouse is a star or a snowflake and is heavily denormalized for performance. The fact table lineitem in the example is very large compared to the other two tables. To expedite processing, the fact table is partitioned horizontally, and the other tables are replicated at all locations where partitions exist. Thus, all query joins are local to each machine, and the aggregations can be executed as a tree.
ii) NoSQL systems provide techniques to store and process data that is typically in the form of key-value pairs, graphs, or documents [26, 30]. Typical queries involve filtering and transformations on a single input table, while joins are usually avoided as they are often expensive; required joins can be realized atop such systems [?]. The following example dataflow shows how a simple intrusion detection analysis on server logs could be expressed in FlumeJava [?]:


PCollection<String> in = ReadInput("log.txt");
// Parse each line and convert it to a log entry keyed by IP
PTable<IP, LogEntry> entries =
    in.parallelDo(new LogTransform());
// Group all log entries of the same IP together
PTable<IP, Collection<LogEntry>> g =
    entries.groupByKey();

// Perform the analysis on each group g
PTable<IP, Report> result =
    g.combineValues(new IntrusionAnalysis());
FlumeJava.run();


The dataflow reads the input from file log.txt (one row per line) and converts it to key-value pairs using the LogTransform UDF, with the respective IP as key. It then groups entries by IP and performs an intrusion detection analysis on each group using the IntrusionAnalysis UDF. The usual data placement has the files partitioned in blocks of fixed size and distributed to different VMs, typically using a distributed file system [?]. In this example, LogTransform is executed in parallel on all blocks of the file, and IntrusionAnalysis is executed, again in parallel, on the formed groups.

The solutions proposed in the past for the above categories of queries are not sufficient for cloud environments, because they a) treat all resources indistinguishably, with no attention to the nature of the queries, b) are not elastic, and/or c) target performance by evaluating queries as fast as possible, treating the monetary cost as a secondary consideration or ignoring it completely. Our work fills this gap.

3. PROBLEM FORMULATION 

We present in detail the key aspects of the problem we address, together with the relevant notation and definitions.

3.1 IaaS Cloud

A container or VM is the unit of cloud compute resources and includes CPU(s), memory, disk(s), and network resources. All containers furnished for general use have the same size, i.e., the same capacity in every type of resource they provide (e.g., equal memory size). By and large, this is typical of most clouds, where only a limited number of VMs have substantially enhanced resources to help them run core services (e.g., namenodes for Hadoop [?]). The price $M_q$ for using a container is a fixed amount in dollars per time quantum $T_q$. The set of containers allocated to a cloud application, such as our query processing engine, constitutes the virtual infrastructure of the application. The cloud also offers data storage resources, which are decoupled from its compute resources for flexibility. VMs transfer data from these storage resources and cache it on their local virtual disks for processing.

3.2 Data Partitioning 

Tables are partitioned and replicated so that joins (if any) are local to containers and only aggregations require data transfer. Hence, partitioning is based on the foreign keys used in joins. If the database has only one table (the usual case in NoSQL systems), it is partitioned randomly into shards of equal size. If the database has multiple tables, as happens in data warehouses, the largest tables (one or more, depending on the available storage) are partitioned and all others are replicated wherever the partitions are stored. In this regard, in the TPC-H benchmark, it may be most beneficial to hash-partition the two largest tables, lineitem and orders, on the order key (l_orderkey in lineitem, which is a foreign key to table orders, and o_orderkey in orders), and to replicate the other tables. This is precisely the partitioning scheme we use for TPC-H in our experiments.
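The co-location argument can be checked on toy data. The sketch below hash-partitions two hypothetical miniature tables standing in for lineitem and orders on the order key and verifies that every line item finds its order inside its own partition. The helper names and row contents are illustrative assumptions, and the small replicated tables (supplier, nation, etc.) are omitted.

```python
from collections import defaultdict

def hash_partition(rows, key, num_parts):
    """Split rows (dicts) into num_parts buckets by hashing row[key]."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % num_parts].append(row)
    return parts

NUM_PARTS = 4
# Toy TPC-H-like rows (hypothetical data): 20 orders, 3 line items per order.
orders = [{"o_orderkey": k} for k in range(1, 21)]
lineitem = [{"l_orderkey": k, "l_linenumber": n}
            for k in range(1, 21) for n in range(1, 4)]

# Both large tables are hashed on the order key, so matching rows are co-located.
order_parts = hash_partition(orders, "o_orderkey", NUM_PARTS)
line_parts = hash_partition(lineitem, "l_orderkey", NUM_PARTS)

def local_join(p):
    """Join lineitem and orders using only the rows stored in partition p."""
    keys = {o["o_orderkey"] for o in order_parts[p]}
    return [li for li in line_parts[p] if li["l_orderkey"] in keys]

# Every line item finds its order without leaving its partition.
joined = sum(len(local_join(p)) for p in range(NUM_PARTS))
print(joined, len(lineitem))  # 60 60
```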

3.3 Properties of Analytical Queries 

Issued SQL queries may include filters, joins, and two types of group aggregate functions: distributive and algebraic [?]. Distributive functions are directly parallelizable, as they are commutative and associative and, for a table $T$ with two partitions $T_1, T_2$, satisfy the property $f(T) = f(f(T_1) \cup f(T_2))$. Examples of such functions from SQL include min, max, and sum. Algebraic functions are indirectly parallelizable, as they can be expressed as algebraic combinations of distributive or other algebraic functions. Examples from SQL include count, avg, and stdev, all expressed as increasingly complex combinations of count and sum. More importantly, the queries we support may also include UDFs with arbitrary code that may correspond to distributive or algebraic functions. A UDF example is reservoir sampling, which randomly selects a subset of a table's records with equal probability.
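A quick way to see the distinction is to test the property $f(T) = f(f(T_1) \cup f(T_2))$ directly on two small partitions (hypothetical values), treating $\cup$ as bag union. In this sketch min, max, and sum pass while avg does not, and avg is recovered exactly from the distributive partials (sum, count).

```python
def avg(xs):
    return sum(xs) / len(xs)

def satisfies_distributive_property(f, t1, t2):
    """Check f(T) == f(f(T1) ∪ f(T2)) for T = T1 ∪ T2 (bag union of lists)."""
    return f(t1 + t2) == f([f(t1), f(t2)])

T1, T2 = [1, 2, 3], [10]
checks = {name: satisfies_distributive_property(fn, T1, T2)
          for name, fn in [("min", min), ("max", max), ("sum", sum), ("avg", avg)]}
print(checks)  # {'min': True, 'max': True, 'sum': True, 'avg': False}

# avg is algebraic: combine the distributive partials (sum, count) instead.
partials = [(sum(t), len(t)) for t in (T1, T2)]
avg_from_partials = sum(s for s, _ in partials) / sum(c for _, c in partials)
print(avg_from_partials)  # 4.0
```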

Using the above properties, we may readily transform flat queries into tree plans by recursively unwrapping all algebraic functions until only distributive functions are left. For example, consider two tables R(A,B,...) and S(B,...), both partitioned on column B, and the following flat query:

select avg(A) as AA from R, S where R.B = S.B 

We transform the above SQL statement into a tree-based one using the following four "conceptual queries": leaf, internal-initial, internal-recursive, and root. The particulars of each query are as follows:

• Leaf: carrying out filtering and joins

select A from R, S where R.B = S.B;

• Internal-initial: executing the distributive aggregate initialization

select sum(A) as SA, count(*) as CA from leaf;

• Internal-recursive: producing partial distributive aggregation(s)

select sum(SA) as SA, sum(CA) as CA
from internal-initial;

• Root: compiling the sought algebraic aggregation(s)

select sum(SA) / sum(CA) as AA
from internal-recursive;

The above conceptual queries have to be placed on the morphed query execution tree (e.g., Figure 1). The leaf queries are placed at level 0 of the execution tree in order to be executed in parallel on each partition. Since internal-initial also operates on each partition independently, this type of query can also be part of level 0. Between level 1 of the tree (e.g., Figure 1) and its root, we place internal-recursive queries. Given the commutativity and associativity of distributive functions, there may be an arbitrary number of levels of internal-recursive queries without affecting correctness. The actual number of internal levels of the resulting query tree depends on the size of the original tables and the affordable degree of parallelization. Finally, note that, for a query without algebraic functions, the root query is identical to the internal-recursive query.
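The four conceptual queries can be mimicked in a few lines of Python to confirm that the number of internal-recursive levels does not change the result. In this sketch, each function plays the role of one conceptual query, `fan_in` (a hypothetical parameter) controls how many partial results an internal node merges, and the data are toy tables R(A, B) and S(B) partitioned on B.

```python
def leaf(r_part, s_part):
    """Leaf: select A from R, S where R.B = S.B (on one partition)."""
    s_keys = {b for (b,) in s_part}
    return [a for (a, b) in r_part if b in s_keys]

def internal_initial(values):
    """Internal-initial: select sum(A) as SA, count(*) as CA from leaf."""
    return (sum(values), len(values))

def internal_recursive(partials):
    """Internal-recursive: select sum(SA) as SA, sum(CA) as CA from the level below."""
    return (sum(sa for sa, _ in partials), sum(ca for _, ca in partials))

def root(partials):
    """Root: select sum(SA) / sum(CA) as AA from internal-recursive."""
    sa, ca = internal_recursive(partials)
    return sa / ca

def run_tree(partitions, fan_in):
    level = [internal_initial(leaf(r, s)) for r, s in partitions]  # level 0
    while len(level) > fan_in:  # add internal-recursive levels as needed
        level = [internal_recursive(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return root(level)

# Toy data (hypothetical): R(A, B) and S(B), both partitioned on B.
P = 8
R = [(a, a % 7) for a in range(1, 101)]
S = [(b,) for b in (0, 2, 4, 6)]
partitions = [([r for r in R if r[1] % P == p], [s for s in S if s[0] % P == p])
              for p in range(P)]

flat = [a for (a, b) in R if b in {0, 2, 4, 6}]
flat_avg = sum(flat) / len(flat)  # the original flat query: select avg(A)
for fan_in in (2, 3, 8):
    print(fan_in, run_tree(partitions, fan_in), flat_avg)
```

With `fan_in=2` the eight level-0 results pass through two internal-recursive levels before the root; with `fan_in=8` they go straight to the root, and all runs print the same average.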

3.4 Service Level Agreement 

An SLA is a function having query execution time as input and money as output, namely, $SLA: \mathbb{R}^{+} \to \mathbb{R}$, both in appropriate units, often seconds and dollars, respectively. SLAs can be step-wise or more sophisticated [?].

Inspired by other works, we use a generic form of SLAs defined as follows:

$$SLA(u, q, t) = \alpha \cdot e^{-t/\gamma},$$

where $\alpha$ and $\gamma$ regulate, respectively, the maximum amount of money a user pays and the rate at which the price decreases with time. A small $\gamma$ indicates a critical query that should be executed rapidly, as its value drops drastically. Alternatively, a large $\gamma$ indicates a best-effort query. We avoid using a step function because smoothness plays an important role in our optimization problem, as we describe in Section ?.

Figure 3: Two SLAs: 'critical' and 'best-effort'.

An example of two different SLAs is shown in Figure 3. The critical SLA has $\alpha = 100$ and $\gamma = 40$, and the best-effort SLA has $\alpha = 20$ and $\gamma = 500$. Notice that the critical SLA is very profitable for low execution times, but its price drops quickly. The definition of SLAs can be extended to include negative values: a penalty that the service provider pays if the execution time is large. We leave the exploration of this alternative as future work.
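For concreteness, the two SLAs can be evaluated numerically. The sketch assumes the exponential form $\alpha \cdot e^{-t/\gamma}$, which is consistent with the price function $price(t) = 15 \cdot e^{-t/20}$ used in Section 4, and takes the two parameter sets of Figure 3. `crossover_time` is a hypothetical helper that solves $\alpha_1 e^{-t/\gamma_1} = \alpha_2 e^{-t/\gamma_2}$ for the execution time after which the best-effort SLA pays more than the critical one.

```python
import math

def sla_price(t, alpha, gamma):
    """Price paid for a query answered in t seconds: alpha * exp(-t / gamma)."""
    return alpha * math.exp(-t / gamma)

CRITICAL = (100.0, 40.0)      # (alpha, gamma) of the critical SLA
BEST_EFFORT = (20.0, 500.0)   # (alpha, gamma) of the best-effort SLA

def crossover_time(sla_a, sla_b):
    """Execution time at which two exponential SLAs pay the same price."""
    (a1, g1), (a2, g2) = sla_a, sla_b
    return math.log(a1 / a2) / (1 / g1 - 1 / g2)

for t in (0, 30, 70, 120):
    print(t, round(sla_price(t, *CRITICAL), 2), round(sla_price(t, *BEST_EFFORT), 2))
print(round(crossover_time(CRITICAL, BEST_EFFORT), 1))  # 70.0
```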

3.5 Profit Maximization Problem 

The queries are issued to the engine in a streaming fashion, and each query is associated with its own SLA. The price charged for a query is computed using both its SLA and its execution time. The revenue $R$ generated by the engine during a particular time period $p$ is computed as the sum of the prices of all queries launched during the period in question. The operational cost in $p$ using $c$ containers is computed as $O = c \cdot \lceil p / T_q \rceil \cdot M_q$, i.e., each container is charged $M_q$ for every time quantum $T_q$ for which it is leased. The profit $P$ during the same period is computed as $P = R - O$. Our optimization objective is to maximize the provider's profit during the operation of the engine, i.e., to maximize the difference between revenue and operational cost.
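The profit bookkeeping for one period can be sketched as follows. Two assumptions are made explicit here: prices follow the exponential SLA $\alpha \cdot e^{-t/\gamma}$, and every started quantum $T_q$ is billed in full at $M_q$ per container (rounding up), in line with per-quantum pricing. Function names and numbers are hypothetical.

```python
import math

def sla_price(t, alpha, gamma):
    return alpha * math.exp(-t / gamma)

def operational_cost(c, p, t_q, m_q):
    """Cost of c containers over a period p; every started quantum t_q costs m_q.
    Rounding up to whole quanta is an assumption based on per-quantum pricing."""
    return c * math.ceil(p / t_q) * m_q

def period_profit(queries, c, p, t_q, m_q):
    """queries: (execution_time, alpha, gamma) of each query launched in the period."""
    revenue = sum(sla_price(t, a, g) for t, a, g in queries)
    return revenue - operational_cost(c, p, t_q, m_q)

# Hypothetical numbers: 4 containers, a 300 s period, 0.5 dollars per 3600 s quantum.
queries = [(12.0, 100.0, 40.0), (45.0, 20.0, 500.0)]
print(round(period_profit(queries, c=4, p=300, t_q=3600, m_q=0.5), 2))  # 90.36
```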

Figure 5 illustrates our optimization goal; it shows a typical revenue curve per time quantum as affected by the number of containers [?]. The y-axis indicates the rate at which revenue is generated. The figure also shows the operational cost of the engine per time quantum, which is linear in the number of containers allocated, as the expense incurred by the provider for every VM is the same. Our goal is to identify the point M, i.e., the number of containers that maximizes profit, where the difference between revenue and operational cost is largest. Notice that the revenue function is a "moving target", as it highly depends on the query workload, and so M changes over time. The engine should be able to dynamically adapt to workload changes and find the optimal point of operation at any moment.

4. ILLUSTRATIVE EXAMPLE 

In this section, we present an example to give a high-level overview of our approach. Figure 4(a) depicts two queries that are issued concurrently to the engine. Each query is transformed into a tree execution plan with its data at the leaves of the tree and the respective operators at the internal nodes. For simplicity, assume that the execution time of each operator is 1 second and that each operator generates a negligible amount of data. Further, assume that the SLAs for both queries are identical and defined as $price(t) = 15 \cdot e^{-t/20}$, where $t$ is the query execution time measured in seconds and the price is measured in dollars.

Figure 5: Profit maximization based on revenue and operational cost.

The engine has allocated several VMs from the cloud, and the data is appropriately partitioned. We lay out the deployed VMs in a "tree" shape to naturally map the execution plans of both queries onto the allotted virtual infrastructure. This tree-shaped use of resources may lead to diverse deployments, as Figure 4(b) illustrates; here, we depict three different VM layouts that materialize Q1 and Q2. More specifically, layout (ii) of Figure 4(b) works with 9 VMs, of which 4 are at the data level ($L_0$), 2 are at each intermediate level ($L_1$ and $L_2$), and 1 is at the root ($L_3$).

The three layouts of Figure 4(b) yield different processing times when Q1 and Q2 are executed concurrently. For example, in layout (ii), Q1 and Q2 complete at the 9th and 10th second, respectively. The turnaround times are computed by summing the delay each query faces at each level of the layout. Given that we have 16 operators at $L_0$ and each one runs for 1 second, the total delay using 4 VMs is 4 seconds. At $L_1$, we have 6 operators (2 from Q1 and 4 from Q2) that yield a delay of 3 seconds, since 2 VMs are used. Similarly, at levels $L_2$ and $L_3$ the respective delays stand at 2 seconds and 1 second.



                  Layout A   Layout B   Layout C
Q1 Time (sec)           13          9          6
Q1 Price ($)          7.83       9.57      11.11
Q2 Time (sec)           14         10          7
Q2 Price ($)          7.44       9.09      10.57
Revenue ($)          15.27      18.66      21.68
VM Cost ($)           7.00       9.00      15.00
Profit ($)            8.27       9.66       6.68

Table 1: Profit for the Three Different Layouts.


Assuming for simplicity that each VM costs $1, and using the above execution times for each level, the price formula computes the charged price for each query. Table 1 shows both the revenue (i.e., the sum of prices) and the profit made on the provided service. The latter is computed as the difference between revenue and cost, and yields layout (ii) (layout B in Table 1) as the best of the three choices in Figure 4(b).
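The figures in Table 1 can be recomputed from the price formula. The sketch below takes the completion times and VM counts of the three layouts from the table, charges 1 dollar per VM as in the text, and picks the most profitable layout; the recomputed values agree with Table 1 up to rounding in the last digit. It also checks that Q2's 10-second completion time in layout (ii) is the sum of the per-level delays (4 + 3 + 2 + 1).

```python
import math

def price(t):
    """SLA of the illustrative example: 15 * exp(-t / 20) dollars."""
    return 15 * math.exp(-t / 20)

VM_COST = 1.0  # dollars per VM, as assumed in the text
# (Q1 time, Q2 time, number of VMs) for each layout, taken from Table 1
LAYOUTS = {"A": (13, 14, 7), "B": (9, 10, 9), "C": (6, 7, 15)}

def evaluate(t_q1, t_q2, vms):
    revenue = price(t_q1) + price(t_q2)
    return revenue, revenue - VM_COST * vms

results = {name: evaluate(*spec) for name, spec in LAYOUTS.items()}
for name, (revenue, profit) in results.items():
    print(f"layout {name}: revenue {revenue:.2f}, profit {profit:.2f}")
best = max(results, key=lambda name: results[name][1])
print("best:", best)  # best: B

# Layout (ii)/B: Q2's time is the sum of the per-level delays given in the text.
q2_time_layout_ii = 16 / 4 + 6 / 2 + 2 + 1
print(q2_time_layout_ii)  # 10.0
```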

Figure 4: a) Execution plans for Q1/Q2, b) Mapping of the 2 plans on 3 different VM layouts, and c) Profit as a function of the number of VMs $l_0$ and $l_1$.

If we are to automate the above procedure, we need to articulate execution times in diversified layouts given a number of VMs $l_i$ at each level $L_i$. Given that the number of operators drops exponentially from the leaves to the root of the query tree, the two lower levels $L_0$ and $L_1$ have the greatest impact on the turnaround time of queries. If we assume that the VM numbers $l_2$ and $l_3$ do not change, the execution times for Q1 and Q2 required for levels $L_2$ and $L_3$ are 2 and 3 seconds, respectively. The potential profit generated by levels $L_0$ and $L_1$ when different numbers of VMs are deployed to materialize the two queries is as follows:

$$profit(l_0, l_1) = price(t_{Q1}) + price(t_{Q2}) - cost(l_0 + l_1) = 15\, e^{-(t(l_0, l_1) + 2)/20} + 15\, e^{-(t(l_0, l_1) + 3)/20} - (l_0 + l_1),$$

where $t(l_0, l_1) = 16/l_0 + 6/l_1$ is the time required for the concurrent execution of the two queries at levels $L_0$ and $L_1$; here, 16 is the total time needed by all operators at $L_0$, carried out by $l_0$ VMs (assuming perfect load balancing), and 6 is the total time required by the operators at $L_1$, which are carried out by $l_1$ VMs. Figure 4(c) plots the contour of the expected $profit()$ as a function of $l_0$ and $l_1$; the profit increases as we move from darker to lighter colors. For this specific version of the layout problem, the contour plot points out that the optimal solution is around 4 VMs at $L_0$ and 3 VMs at $L_1$.
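The contour of Figure 4(c) can be complemented by an exhaustive search over integer VM counts. This sketch evaluates $profit(l_0, l_1) = 15 e^{-(t+2)/20} + 15 e^{-(t+3)/20} - (l_0 + l_1)$ with $t = 16/l_0 + 6/l_1$ for 1 to 10 VMs per level (the search range is an arbitrary choice). Under this formula the best integer point is $l_0 = 4$, $l_1 = 2$ (the configuration of layout (ii)), with $(4, 3)$ less than 0.05 dollars lower, i.e., the same neighborhood the contour plot points to.

```python
import math

def price(t):
    return 15 * math.exp(-t / 20)

def t_levels(l0, l1):
    """Time spent at levels L0 and L1: 16 s of work on l0 VMs plus 6 s on l1 VMs."""
    return 16 / l0 + 6 / l1

def profit(l0, l1):
    """Q1 spends 2 s and Q2 3 s in the upper levels; each VM costs 1 dollar."""
    t = t_levels(l0, l1)
    return price(t + 2) + price(t + 3) - (l0 + l1)

candidates = [(l0, l1) for l0 in range(1, 11) for l1 in range(1, 11)]
best = max(candidates, key=lambda c: profit(*c))
print(best, round(profit(*best), 2))   # (4, 2) 12.66
print((4, 3), round(profit(4, 3), 2))  # (4, 3) 12.62
```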

We have to generalize the solution for layout selection discussed above to consider multiple parameters, including the numbers $l_i$ of VMs allotted to every level, the number of queries considered together, potential data reorganization, SLAs, as well as timing aspects of the engine's operation. In doing so, the following challenges arise: A) how to find the optimal number of VMs given a query workload, B) how to schedule the query execution trees on the available VMs, C) how to dynamically change the layout and adapt to changes in the workload, and D) how to partition the data in order to add or remove VMs without significant network overhead. We address each of these challenges in the following sections.

5. OVERALL APPROACH

In this section, we present the overall approach we use to maximize profit. Time is divided into windows of fixed length (e.g., epochs of 300 seconds), and inside each window we do not adjust the virtual infrastructure. All queries issued within a window are scheduled assuming a fixed container layout. At the beginning of each window, we compute the new layout based on the measurements collected from the queries in a number of previous time windows, while taking into account the data reconfiguration cost. In this section, we discuss the data partitioning scheme we employ, the elastic container layout, our online elastic layout allocation approach, and the query scheduler we use.

5.1 Container Layout 

A container layout is a hierarchical overlay on top of the allocated containers that defines the allowed communication channels between them. Figure ? shows this generic layout. Each level has a fixed number of initial containers (shown in red in the figure) and is elastic, i.e., it can change in size by allocating or deleting containers while enforcing optional minimum/maximum thresholds. The table partitions are located at the lowest level of the layout. Each VM at internal level $L_i$ can communicate only with the levels directly above ($L_{i+1}$) and below ($L_{i-1}$). Trees with a height of 4 or more are rarely needed in practice and only appear in very large data centers [?]. For this reason, we use 3 levels in our setting; however, this is configurable.
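A minimal data structure for such a layout is sketched below. The class and field names are hypothetical; the sketch only encodes the two rules stated above: containers communicate with adjacent levels only, and each level grows or shrinks within optional minimum/maximum thresholds.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Level:
    size: int                       # containers currently allocated
    min_size: int = 1               # optional lower threshold
    max_size: Optional[int] = None  # optional upper threshold (None = unbounded)

@dataclass
class ContainerLayout:
    levels: List[Level]  # levels[0] is the data level L0, levels[-1] the root

    def can_communicate(self, i: int, j: int) -> bool:
        """Containers talk only to the level directly above or below."""
        return abs(i - j) == 1

    def resize(self, i: int, delta: int) -> int:
        """Allocate (delta > 0) or release (delta < 0) containers within thresholds."""
        lvl = self.levels[i]
        new_size = max(lvl.min_size, lvl.size + delta)
        if lvl.max_size is not None:
            new_size = min(lvl.max_size, new_size)
        lvl.size = new_size
        return new_size

    def total(self) -> int:
        return sum(lvl.size for lvl in self.levels)

# Three levels, as in our default setting (sizes and thresholds are hypothetical).
layout = ContainerLayout([Level(4, min_size=2, max_size=16), Level(2), Level(1, 1, 1)])
print(layout.can_communicate(0, 1), layout.can_communicate(0, 2))  # True False
print(layout.resize(0, +20), layout.resize(2, +3), layout.total())  # 16 1 19
```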

5.2 Data Partitioning and Placement 

Our method is based on consistent hashing (CH) as its 

present good theoretical bounds on the size of data required 
to move when containers are added or deleted. Table parti¬ 
tions are placed in a logical circle as shown in the inner-circle 
of Eigure[^a). The outer circle consists of the deployed con¬ 
tainers at Lq with each one assigned one or more partitions. 
Eor example, partition 3 is assigned to container 2. Notice 
that we place each partition multiple times in the inner cir¬ 
cle. The first time a partition is accessed from the cloud 
storage is cached for subsequent usage. When a new con¬ 
tainer is added, it is placed in the outer circle at a position 
next to the container having the largest number of parti¬ 
tions; the latter sheds half of its data partitions to the new 
arrival. Eor example, when new container #6 is added, is 
placed next to #5; Containers #5 and ^6 then split the 
existing partitions as shown Eigure[^b). 
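The placement rule can be modeled without hashing at all, by keeping partitions in ring order on the inner circle and letting each container own a contiguous arc of it. The sketch below is such a simplified model (the class `PartitionRing` and its methods are hypothetical, not Exareme's implementation): adding a container splits the fullest arc in half, as described above, while removal handing the arc to a neighbor is an assumption of the sketch. `moved_fraction` counts the partition copies that must be fetched from cloud storage after a change.

```python
class PartitionRing:
    """Simplified model of the two circles (hypothetical, not Exareme code).

    Inner circle: partition ids in ring order, each repeated `replication` times.
    Outer circle: containers, each owning a contiguous arc [start, end) of slots.
    """

    def __init__(self, num_partitions, replication, num_containers):
        self.slots = [p for p in range(num_partitions) for _ in range(replication)]
        n = len(self.slots)
        self.arcs = [[c, c * n // num_containers, (c + 1) * n // num_containers]
                     for c in range(num_containers)]
        self.next_id = num_containers

    def assignment(self):
        """Partitions per container; replicas landing on one container are kept once."""
        return {cid: set(self.slots[start:end]) for cid, start, end in self.arcs}

    def add_container(self):
        """Place a new container next to the fullest one, which sheds half its slots."""
        i = max(range(len(self.arcs)), key=lambda k: self.arcs[k][2] - self.arcs[k][1])
        _, start, end = self.arcs[i]
        mid = (start + end) // 2
        self.arcs[i][2] = mid
        new_id = self.next_id
        self.next_id += 1
        self.arcs.insert(i + 1, [new_id, mid, end])
        return new_id

    def remove_container(self, cid):
        """Assumption: the predecessor (or the successor, for the first arc) absorbs
        the removed arc. Requires at least two containers."""
        i = next(k for k, arc in enumerate(self.arcs) if arc[0] == cid)
        _, start, end = self.arcs.pop(i)
        if i > 0:
            self.arcs[i - 1][2] = end
        else:
            self.arcs[0][1] = start

def moved_fraction(before, after):
    """Share of (container, partition) copies that must be fetched anew."""
    fetched = sum(len(parts - before.get(cid, set())) for cid, parts in after.items())
    return fetched / sum(len(parts) for parts in after.values())

ring = PartitionRing(num_partitions=16, replication=1, num_containers=4)
before = ring.assignment()
new_id = ring.add_container()
after = ring.assignment()
print(new_id, sorted(after[new_id]), moved_fraction(before, after))  # 4 [2, 3] 0.125
```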

To increase parallelism and flexibility, we use over-partitioning and replication. We partition the tables into many more parts than the maximum number of data containers we expect to use (e.g., 10 times more). Thus, changing the number of containers only causes data transfers between the cloud storage and the VMs; it does not call for extensive re-partitioning (e.g., using hashing) on the cloud storage, an operation that is in general very expensive and incurs high network traffic [?]. Furthermore, we employ replication by adding each partition multiple times to the inner circle of Figure 6(a) in adjacent positions. Thus, when high parallelism is needed, the same partition will be assigned to multiple containers. Here, we balance the load between the containers that are assigned replicas of a partition. If more than one replica happens to be assigned to the same container, we keep only one copy.

Figure 6: Partitioning and Placement of Data using consistent hashing.

Figure 7: Percentage of partitions assigned to a different container when changing their number (a) and the modeling of data movement (b). (Axis of panel (a): initial containers.)

Load balance is crucial in our setting, since the execution time of the operators at each level of the layout is bounded by the operator with the maximum execution time. Partitioning skew will delay the execution of all queries and affect revenue. For this reason, we extend the baseline CH to make it "more aggressive" when adding or removing containers, as follows: instead of splitting partitions between two containers, we perform a local balancing around the insertion point and split partitions among the $w_{rc}+1$ containers in its vicinity. This way, the reorganization is still local in the circle, but the partitioning is more balanced. In practice, we use a window of size $w_{rc} = 4$.
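The difference between the baseline split and the more aggressive local balancing can be illustrated on the number of inner-circle slots each container owns. In this sketch (hypothetical helper names), the baseline gives the new container half of the fullest container's slots, while the aggressive variant then spreads the slots of the `w_rc + 1` adjacent arcs around the insertion point evenly, where `w_rc` stands for the balancing window size; for simplicity it ignores wrap-around at the end of the ring.

```python
def baseline_insert(lengths):
    """Baseline CH: the new container takes half of the fullest container's slots."""
    i = max(range(len(lengths)), key=lengths.__getitem__)
    half = lengths[i] // 2
    return lengths[:i] + [lengths[i] - half, half] + lengths[i + 1:]

def aggressive_insert(lengths, w_rc=4):
    """Rebalance the w_rc + 1 arcs around the insertion point (no wrap-around)."""
    out = baseline_insert(lengths)
    new_pos = max(range(len(lengths)), key=lengths.__getitem__) + 1
    size = min(w_rc + 1, len(out))
    start = max(0, min(new_pos - size // 2, len(out) - size))
    q, r = divmod(sum(out[start:start + size]), size)
    out[start:start + size] = [q + 1 if k < r else q for k in range(size)]
    return out

arcs = [8, 8, 8, 8, 8]  # slots per container, in ring order (hypothetical)
print(baseline_insert(arcs))    # [4, 4, 8, 8, 8, 8]
print(aggressive_insert(arcs))  # [7, 7, 6, 6, 6, 8]
```

Both variants only move slot boundaries inside a small stretch of the ring, so the reorganization stays local; the aggressive one trades a few more moved slots for a max-min imbalance of 2 instead of 4 in this example.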

Figure 7(a) presents the outcome of an experiment using CH with 128 partitions and replication degree 3, whose goal is to demonstrate the robustness of the method. The x and y axes show the initial and final number of containers (i.e., going from x to y containers). If x < y, new containers are allocated; otherwise, containers are deleted. We observe that when the changes are near the diagonal of the 2D space, CH is robust to changes, as the percentage of partitions requiring reassignment remains low (< 10%). This characteristic makes CH ideal as a partition placement policy for our elastic processing engine.

We need to model the above behavior of CH to use it in our optimization process and thus take data reorganization into account when adjusting the size of the deployed virtual infrastructure. Figure 8 shows a 1-D cut of the two-dimensional plot of Figure 7(a) at 125 containers, which reveals a strong linear correlation between the number of containers and the percentage of partitions moved. Figure 7(b) provides the sought model that predicts the data that needs to be transferred when the number of VMs changes. Let $x$ and $y$ be the previous and new number of containers. The size of the data that has to move is modeled as

$$size_d(x, y) = \left(1 - \min\left(\frac{x}{y}, \frac{y}{x}\right)\right) \cdot data\_size,$$

where $data\_size$ is the total volume of the tables, taking into account partitioning and replication. The factor $\min(x/y, y/x)$ makes the model symmetric about the diagonal of the 2D space, i.e., $size_d(x, y) = size_d(y, x)$. We measured the modeling error by computing the difference between the actual number of partitions moved (as shown in Figure 7(a)) and the predictions of our model, and found the estimation error to be 6.4% on average, which we consider very robust.

Figure 8: A 1-D cut of Figure 7(a) at 125 containers (legend: Data Move, Linear Fit).
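The data-movement model translates directly into a one-line function. The sketch below transcribes $size_d(x, y) = (1 - \min(x/y, y/x)) \cdot data\_size$; the function name and the 1000-unit data volume are illustrative, and the usage shows the symmetry $size_d(x, y) = size_d(y, x)$.

```python
def size_d(x, y, data_size):
    """Predicted volume moved when going from x to y data containers."""
    return (1 - min(x / y, y / x)) * data_size

# Hypothetical data volume of 1000 units (partitioned and replicated tables).
print(round(size_d(100, 125, 1000), 1), round(size_d(125, 100, 1000), 1))  # 200.0 200.0
print(size_d(125, 125, 1000))  # 0.0
```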

5.3 Elastic Layout Allocation 

Our proposed algorithm for Elastic Layout Allocation dynamically changes the container layout based on the received query workload in order to maximize profit. The proposed online algorithm works as follows: it uses the queries issued in a historical window $W_h$, their CPU load, and the data they transferred through the network. Using these statistics, the algorithm makes predictions for a window of size $W_p$ in the future [?]. We model the profit as a multivariable function, representing each level of the container layout with a variable that indicates the number of containers allocated at that level. The goal is to find the optimal number of containers in each level that maximizes profit in the prediction window. In our experiments, we use a historical window of 2 epochs (i.e., 600 seconds) to make predictions for the upcoming window of 300 seconds. Notice that a large $W_h$ will cause the engine to adapt slowly to the workload, while a small $W_h$ may cause it to change rapidly; neither extreme is ideal. We experimentally ascertained that these window sizes behave well, and we leave the automated learning of these numbers for future work. Next, we formally define our optimization function.

The queries are separated into a finite number of classes, each having its own SLA, which is the usual case in practice [?]. We denote as $\vec{\alpha}$ and $\vec{\gamma}$ the vectors carrying the respective $\alpha$ and $\gamma$ values for all SLAs. Let $Q_H$ be the vector with the number of queries per SLA that have been executed during the historical window $W_h$. The total number of queries is $numQ_H = \sum_i Q_H[i]$. We denote as $L_H$ the current number of containers allocated at each level of the layout. Similarly, $CPU_H$ is the vector with the sum of CPU loads at every level of the layout within the historical window, and $NET_H$ is the total amount of data transferred outwards from every level. Furthermore, we designate $conc$ to be the average number of queries running concurrently at any point in time. We compute $conc$ by summing the execution times of all queries within the historical window and dividing this sum by the length of the window. All concurrently running queries share the same resources and thus implicitly affect each other.
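The historical statistics can be maintained with a bounded queue of epochs. The sketch below is a hypothetical bookkeeping structure (the names `QueryRecord` and `HistoryWindow` are assumptions): it keeps the last two epochs, as in our experiments, and derives $Q_H$, $numQ_H$, $CPU_H$, $NET_H$, and $conc$, the latter as the total execution time of the recorded queries divided by the window length.

```python
from collections import deque
from dataclasses import dataclass
from typing import List

@dataclass
class QueryRecord:
    sla_class: int      # index into the SLA vectors
    exec_time: float    # seconds
    cpu: List[float]    # CPU seconds spent at each layout level
    net: List[float]    # data sent outwards from each level

class HistoryWindow:
    """Keeps the last `num_epochs` epochs of query measurements (W_h)."""

    def __init__(self, num_epochs=2, epoch_len=300.0):
        self.epochs = deque(maxlen=num_epochs)  # oldest epoch is dropped automatically
        self.epoch_len = epoch_len

    def close_epoch(self, records):
        self.epochs.append(list(records))

    def stats(self, num_classes, num_levels):
        """Return (Q_H, numQ_H, CPU_H, NET_H, conc) over the current window."""
        records = [r for epoch in self.epochs for r in epoch]
        q_h = [0] * num_classes
        cpu_h = [0.0] * num_levels
        net_h = [0.0] * num_levels
        for r in records:
            q_h[r.sla_class] += 1
            for i in range(num_levels):
                cpu_h[i] += r.cpu[i]
                net_h[i] += r.net[i]
        window_len = len(self.epochs) * self.epoch_len
        conc = sum(r.exec_time for r in records) / window_len if window_len else 0.0
        return q_h, sum(q_h), cpu_h, net_h, conc

h = HistoryWindow(num_epochs=2, epoch_len=300.0)
h.close_epoch([QueryRecord(0, 600.0, [1.0, 1.0], [1.0, 1.0])])  # falls out of W_h
h.close_epoch([QueryRecord(0, 60.0, [10.0, 2.0], [5.0, 1.0])])
h.close_epoch([QueryRecord(1, 120.0, [20.0, 4.0], [6.0, 2.0]),
               QueryRecord(0, 120.0, [30.0, 6.0], [7.0, 3.0])])
print(h.stats(num_classes=2, num_levels=2))
# ([2, 1], 3, [60.0, 12.0], [18.0, 6.0], 0.5)
```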

Dealing with the prediction window Wp, we denote as 
Lp the container topology computed. Using the historical 
measurements and Lp, we can predict the average running 
time of the queries in the prediction window as follows: 


$$\bar{t}_P = \frac{conc}{numQ_H} \sum_i \left( \frac{CPU_H[i]}{L_P[i]} + \frac{NET_H[i]}{net\_speed \cdot \min(L_P[i-1], L_P[i])} \right)$$


where CPU_H[i]/L_P[i] is the CPU load per container at level i of the layout. The factor 1/numQ_H above calculates the average time expended per query, and we multiply by conc in order to model the delay that each query imposes on others running concurrently. At this point, our model assumes that it can achieve perfect load balance at every level of the layout. The rationale is that there are many operators at each level and none of them is expensive to execute. Given that, we can solve the relaxed problem and round the solution to integer values. The total network time of each container at level i is computed as:

$$\frac{NET_H[i]}{net\_speed \cdot \min(L_P[i-1], L_P[i])}$$


since the maximum network throughput between two consecutive levels i-1 and i is determined by the minimum number of containers in these two levels.
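The predicted average running time can be evaluated for a candidate layout in a few lines of Python. This is a sketch of the formula above under two assumptions not stated in the text: index 0 is the data level, and the network term applies only from level 1 upwards, where a preceding level exists. All inputs in the usage line are made-up numbers.

```python
def predicted_avg_time(conc, num_q_h, cpu_h, net_h, l_p, net_speed):
    """Average query running time predicted for candidate layout l_p.

    cpu_h[i]: CPU load summed over W_H at layout level i (seconds)
    net_h[i]: data transferred at level i over W_H (units of net_speed * seconds)
    Assumption: index 0 is the data level; network terms start at level 1.
    """
    total = 0.0
    for i in range(len(l_p)):
        total += cpu_h[i] / l_p[i]  # CPU load per container, assuming perfect balance
        if i > 0:
            # throughput between levels i-1 and i is bounded by the smaller level
            total += net_h[i] / (net_speed * min(l_p[i - 1], l_p[i]))
    # average time per query (1/numQ_H), scaled by the concurrency conc
    return conc * total / num_q_h


t = predicted_avg_time(conc=2.0, num_q_h=10, cpu_h=[1000.0, 100.0, 10.0],
                       net_h=[0.0, 300.0, 30.0], l_p=[10, 4, 1], net_speed=15.0)
print(t)  # 28.4
```

Doubling the data level to 20 containers in this toy input lowers the prediction, which is the trade-off the optimizer weighs against the extra container cost.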

We separate the prediction window into two parts: the first involving re-organization along with query execution (of length t_R) and the second involving query execution only. The length of the first period is estimated as the time needed to perform data re-organization, using the model of sized(x, y) defined above, as follows:

$$t_R = \frac{sized(L_H[1], L_P[1])}{|L_H[1] - L_P[1]| \cdot A_{rc} \cdot net\_speed}$$


where |L_H[1] - L_P[1]| · A_rc is the number of containers in the circle affected by the change. These containers transfer table partitions from the cloud storage through the network, with net_speed being the network speed. Thus, the length of the second period, dedicated exclusively to query processing, is W_P - t_R. Notice that the faster the data re-organization, the longer the period of time spent executing queries. This is the reason why our method prefers to change the number of data containers near the diagonal, as shown in Figure [^](a).
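A sketch of the re-organization time t_R and of the remaining execution-only period W_P - t_R. The value of sized(·,·), defined earlier in the paper, is passed in directly as `moved_size`; the 3000 MB figure and A_rc = 2 are hypothetical, while 18.75 MB/s corresponds to the roughly 150 Mbps measured in Section 6.1.

```python
def reorg_time(moved_size, l_h_data, l_p_data, a_rc, net_speed):
    """t_R: data re-organization time when the data level goes from l_h_data to
    l_p_data containers. moved_size stands in for sized(L_H[1], L_P[1])."""
    changed = abs(l_h_data - l_p_data)
    if changed == 0:
        return 0.0  # data level unchanged: nothing to move
    # |L_H[1] - L_P[1]| * A_rc containers transfer partitions in parallel
    return moved_size / (changed * a_rc * net_speed)


w_p = 300.0
t_r = reorg_time(moved_size=3000.0, l_h_data=10, l_p_data=12, a_rc=2, net_speed=18.75)
print(t_r, w_p - t_r)  # 40.0 260.0
```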

Our model could also include in the re-organization part the time to create and initialize a VM. A simple approach would be to consider this time a constant (e.g., 1 minute). However, in most clouds this time is relatively small compared to the actual time that the VMs are used, and some cloud providers allow for pre-configured instances^, which can be created in seconds, making the initialization time negligible. Most importantly, however, changing the shape of the virtual infrastructure does not directly imply the allocation of new VMs. In our implementation, containers scheduled to be deleted are kept until their entire quantum has finished. If the virtual infrastructure needs to grow in size, we opportunistically re-use any available containers from those scheduled to be deleted, essentially eliminating their initialization cost.

^Okeanos: okeanos.grnet.gr.

We compute the estimated number of queries per SLA in each of the two parts of the prediction window as follows:

$$Q_P^{(1)} = Q_H \cdot \frac{t_R}{W_H}, \qquad Q_P^{(2)} = Q_H \cdot \frac{W_P - t_R}{W_H}$$
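The two estimates simply rescale the historical per-SLA counts to the length of each part of the prediction window. A sketch with hypothetical counts, a 600-second W_H, a 300-second W_P, and t_R = 60 seconds:

```python
def split_queries(q_h, t_r, w_p, w_h):
    """Scale per-SLA query counts from W_H to the two parts of W_P."""
    q_first = [q * t_r / w_h for q in q_h]           # re-organization + execution
    q_second = [q * (w_p - t_r) / w_h for q in q_h]  # execution only
    return q_first, q_second


print(split_queries([30, 10], t_r=60.0, w_p=300.0, w_h=600.0))
# ([3.0, 1.0], [12.0, 4.0])
```

Together the two parts account for 20 of the 40 historical queries, as expected for a prediction window half as long as W_H.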

Using the estimated number of queries, the predicted revenue per SLA class for the two parts of the prediction window is as follows:

Notice that we include the time to perform data re-organization, t_R, in the calculation of the revenue in the first period (R^(1)) of the prediction window. The total revenue in the prediction window is as follows:

$$R = \sum_i \left( R^{(1)}[i] + R^{(2)}[i] \right)$$

The operational cost is computed by adding up the time quanta T_q of the allocated containers in the prediction window W_P and multiplying by the quantum cost M_q:

$$O = M_q \cdot \frac{W_P}{T_q} \sum_i L_P[i]$$
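A sketch of the cost term and the resulting profit R - O, following the formula above. The layout (26, 8, 2), T_q = W_P = 300 s, and M_q = $0.41 are the settings used in Section 6; the revenue value in the test is made up. The second printed line checks the quoted hourly price: 12 quanta per hour at $0.41 each.

```python
def operational_cost(l_p, w_p, t_q, m_q):
    """O = M_q * (W_P / T_q) * sum_i L_P[i]: quanta used by all containers in W_P."""
    return m_q * (w_p / t_q) * sum(l_p)


def profit(revenue, l_p, w_p, t_q, m_q):
    """Profit R - O for a candidate layout l_p."""
    return revenue - operational_cost(l_p, w_p, t_q, m_q)


# Medium layout (26, 8, 2) with T_q = W_P = 300 s and M_q = $0.41
cost = operational_cost([26, 8, 2], w_p=300.0, t_q=300.0, m_q=0.41)
print(round(cost, 2))               # 14.76
print(round(0.41 * 3600 / 300, 2))  # 4.92 dollars per container-hour
```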

The profit generated is computed as R - O. We seek to find the L_P that maximizes profit. Since the number of container layouts is finite when we assume a maximum number of containers per level (e.g., 100), we could in principle compute the profit by enumerating all different layouts. However, the total number of layouts with height 4 and at most 100 containers per level is 100^4 = 10^8, which is infeasible to enumerate exhaustively in practice. Instead, we maximize the profit function using the L-BFGS-B algorithm, a general-purpose iterative optimization method that finds local maxima/minima of multivariable functions subject to bound constraints. Since L-BFGS-B finds real-valued solutions, we round them up to the next integer (e.g., a value of 13.4 becomes 14 containers).
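To make the size argument concrete, the sketch below enumerates every layout for a small toy instance (3 levels, at most 10 containers each, i.e., 1000 layouts) and applies the round-up step used after L-BFGS-B. The profit function is hypothetical; the L-BFGS-B step itself is not reproduced here (SciPy exposes it as `scipy.optimize.minimize(..., method="L-BFGS-B")`).

```python
import itertools
import math


def best_layout_exhaustive(profit_fn, levels, max_per_level):
    """Try every layout with 1..max_per_level containers on each level."""
    best, best_profit = None, float("-inf")
    for layout in itertools.product(range(1, max_per_level + 1), repeat=levels):
        p = profit_fn(layout)
        if p > best_profit:
            best, best_profit = layout, p
    return best, best_profit


def round_up(relaxed):
    """Round a real-valued (relaxed) layout up, keeping at least one container per level."""
    return [max(1, math.ceil(x)) for x in relaxed]


def toy_profit(layout):
    """Hypothetical concave profit that peaks at (6, 3, 1)."""
    return -((layout[0] - 6) ** 2 + (layout[1] - 3) ** 2 + (layout[2] - 1) ** 2)


print(best_layout_exhaustive(toy_profit, levels=3, max_per_level=10))  # ((6, 3, 1), 0)
print(100 ** 4)                    # 100000000 layouts: height 4, <= 100 per level
print(round_up([13.4, 3.0, 0.7]))  # [14, 3, 1]
```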

We seed L-BFGS-B with the previous layout (L_H) as the starting point. Extensive experimentation, enumerating all solutions and comparing the outcomes to those derived with L-BFGS-B, showed that the solutions are very close (yet not identical, mostly due to rounding). This was expected, as changes in the topology are mostly gradual because of the data re-organization cost. Seeding L-BFGS-B with the previous container layout (L_H) is therefore sufficient to adequately guide the algorithm.

eCloudManager: www.fluidops.com 
5.4 Query Tree Scheduler

The execution tree plan is scheduled by balancing the load on every level of the layout while considering the current load of each container. The load is quantified as the number of running and queued operators. First, we find the rank of each operator, that is, the height of its node in the execution tree (Figure [^). The rank of an operator determines the level of the layout at which it is scheduled. As there is at least one container allocated in each level, we can always find at least one valid schedule. Once we determine the levels in which all operators are placed, we order the containers at each level according to their load. The scheduler maps the operators of each level of the query tree to the corresponding containers in increasing load order, in a round-robin fashion. For generic dataflow graphs, the scheduling problem is much harder and more advanced methods should be used [^. However, in this work we consider only tree query plans. The specialized scheduling algorithm discussed here works for the following two reasons: i) individual operators are not expensive to execute and do not generate voluminous data, as they use aggregate functions; as a consequence, even sub-optimal assignments of operators will not cause much imbalance; ii) operators at the same level of the execution tree have approximately the same execution time, since the data is balanced.
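The scheduling steps above can be sketched as follows, assuming operators have string names and that the layout has at least as many levels as the tree is tall. The plan and the container loads in the example are hypothetical; as described, the load order is computed once per level and is not updated after each assignment.

```python
from collections import defaultdict


def operator_ranks(root, children):
    """Rank of each operator = height of its node in the execution tree (leaves: 0)."""
    ranks = {}

    def visit(op):
        kids = children.get(op, [])
        ranks[op] = 1 + max(visit(k) for k in kids) if kids else 0
        return ranks[op]

    visit(root)
    return ranks


def schedule(root, children, loads):
    """Assign operators to containers.

    children: operator -> list of child operators (the tree plan)
    loads[level]: container id -> number of running + queued operators
    """
    by_level = defaultdict(list)
    for op, rank in operator_ranks(root, children).items():
        by_level[rank].append(op)  # the rank selects the layout level
    assignment = {}
    for level, ops in by_level.items():
        # containers of this level ordered by increasing load
        containers = sorted(loads[level], key=loads[level].get)
        for j, op in enumerate(sorted(ops)):
            assignment[op] = containers[j % len(containers)]  # round robin
    return assignment


plan = {"agg": ["p1", "p2"], "p1": ["s1", "s2", "s3"], "p2": ["s4"]}
loads = [{"c0": 2, "c1": 0, "c2": 1}, {"d0": 0, "d1": 3}, {"r0": 0}]
print(schedule("agg", plan, loads))
# {'s1': 'c1', 's2': 'c2', 's3': 'c0', 's4': 'c1', 'p1': 'd0', 'p2': 'd1', 'agg': 'r0'}
```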

Our scheduling method is robust in practice, since it neither assumes a particular operator behavior nor uses a model to predict execution times. The elastic layout allocation algorithm exclusively uses historical measurements taken after queries have been executed, so the actual running times of their operators are known. Further, ongoing queries are not affected by changes in the container layout, as partitions located at the respective VMs are not deleted even if they are re-assigned elsewhere. This is possible because of the decoupled nature of the compute and storage resources used. Finally, our proposed algorithm is well suited to queries featuring UDFs with unknown properties. UDFs are encountered frequently, and modeling and predicting their behavior remains an open problem.

6. EXPERIMENTAL EVALUATION

The objectives of our experimentation are to: A) evaluate our engine and show that we can achieve near-interactive response times for analytical queries, B) show that we can efficiently execute complex analytical queries with UDFs that contain arbitrary user code, and C) examine the effectiveness of the proposed elastic container layout algorithm and ascertain its ability to adapt to the workload.

6.1 Experimental Setup 

Experimental Environment: We have implemented the functionality presented here within Exareme [^, our system for dataflow execution on the cloud. We compare our approach with the latest version of Cloudera Impala, the state-of-the-art in-memory analytics platform [^. We deployed the systems in the Okeanos cloud and used up to 64 VMs for processing, each with 1 CPU, 4 GB of memory, and 20 GB of disk. We measured the network bandwidth to be around 150 Mbps. We set the quantum T_q to 300 seconds and the cost of the quantum M_q to $0.41 (or equivalently ~$5/hour). The memory of the operators in the execution tree is set to 10% of the container's memory, i.e., at most 10 leaf, internal, or root queries can run concurrently in each container. We also used a recent version of the HDFS distributed file system^ as a storage service, deployed on 8 VMs, to store table partitions.

^okeanos.grnet.gr

Figure 9: TPC-H table size distribution at 64 GB and 128 GB scales.

Datasets: we used two datasets, namely TPC-H, which typically models data warehouse settings, and Freebase, an RDF dataset^. The TPC-H benchmark has eight tables: lineitem(128, l_orderkey), orders(128, o_orderkey), part(1), partsupp(1), supplier(1), customer(1), region(1), nation(1).

In parentheses, we indicate the number of partitions we have created for each table and the key(s) on which we performed table partitioning. We hash-partition tables lineitem and orders on their join key (the order key) and replicate all other (smaller) tables. We used 8 (~8 GB), 64 (~64 GB), and 128 (~128 GB) as the TPC-H scale factors. Figure 9 shows the sizes of the benchmark tables, illustrating the large size difference between the fact table lineitem and the rest of the tables.
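Hash-partitioning both lineitem and orders on the order key places matching rows in partitions with the same index, so the join can run partition by partition without moving data. A sketch with toy data; `crc32` is an arbitrary choice of deterministic hash, not necessarily the one Exareme uses.

```python
import zlib

NUM_PARTITIONS = 128  # lineitem and orders both use 128 partitions in the text


def partition_of(orderkey, n=NUM_PARTITIONS):
    """Deterministic hash of the order key (crc32 is an illustrative choice)."""
    return zlib.crc32(str(orderkey).encode()) % n


def hash_partition(rows, key, n=NUM_PARTITIONS):
    """Split rows into n partitions by hashing row[key]."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[partition_of(row[key], n)].append(row)
    return parts


orders = [{"o_orderkey": k} for k in range(1, 1001)]
lineitem = [{"l_orderkey": k, "l_linenumber": j} for k in range(1, 1001) for j in range(3)]
o_parts = hash_partition(orders, "o_orderkey")
l_parts = hash_partition(lineitem, "l_orderkey")

# Join each pair of same-index partitions locally; no row crosses partitions
local_matches = 0
for o_part, l_part in zip(o_parts, l_parts):
    keys = {o["o_orderkey"] for o in o_part}
    local_matches += sum(1 for li in l_part if li["l_orderkey"] in keys)
print(local_matches)  # 3000: every lineitem row finds its order locally
```

The smaller tables need no such alignment: a full copy next to every partition is enough for their joins to stay local as well.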

Freebase contains approximately 2.5 billion tuples in the form of RDF triples, <subject> <predicate> <object>, and its volume stands at 250 GB. If the object is text, it is tagged at its end with the appropriate language symbol (e.g., @en means text in English). We load Freebase data into a 3-column table.

Queries: we use a subset of the TPC-H queries that cover a wide range of the query types we target. In particular, we choose queries 1, 3, 4, 5, 7, and 9. Query 1 uses only table lineitem and has 8 aggregate functions. Queries 3 and 4 have a small number of joins (fewer than 3) and a small number of aggregate functions, while queries 5, 7, and 9 feature a large number of joins and several aggregate functions. With Freebase, we use two queries with complex UDFs to create a histogram of the languages that appear in the dataset. The first query uses regular expressions to extract the language of each object and then counts the occurrences of each language. The query is as follows:

SELECT lang, count(lang) as c
FROM (SELECT REGEXPR('.*@(.*)', o) as lang
      FROM freebase WHERE o like '%@%')
GROUP BY lang ORDER BY c desc;
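For readers less familiar with the REGEXPR UDF, the same logic can be expressed in plain Python with the standard `re` module. This is an illustrative equivalent on a few hand-written objects, not Exareme's UDF implementation; the greedy `.*` means the captured tag is whatever follows the last `@`.

```python
import re
from collections import Counter

LANG_TAG = re.compile(r".*@(.*)")  # same pattern as REGEXPR('.*@(.*)', o)


def language_histogram(objects):
    """Count language tags, most frequent first."""
    langs = []
    for o in objects:
        if "@" not in o:       # WHERE o LIKE '%@%'
            continue
        m = LANG_TAG.match(o)  # greedy .* keeps the text after the last '@'
        if m:
            langs.append(m.group(1))
    return Counter(langs).most_common()  # GROUP BY lang ORDER BY c DESC


print(language_histogram(['"Berlin"@de', '"Berlin"@en', '"Athens"@en', "m.0abc"]))
# [('en', 2), ('de', 1)]
```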

The second query uses reservoir sampling to sample 1 million rows from the table and computes the histogram through a UDF that is applied to the sample and detects the language of a given text using a statistical model. The query is the following:

SELECT lang, count(lang) as c
FROM (SELECT DETECTLANG(sobj) as lang
      FROM (SELECT SAMPLE(1000000, obj) as sobj
            FROM freebase))
GROUP BY lang ORDER BY c desc;

^HDFS version 2.6: hadoop.apache.org
^developers.google.com/freebase/data

Figure 10: TPC-H with 64 GB on Impala and Exareme using 64 VMs. (Panel: TPC-H on Exareme and Impala (64 GB, 64 VMs).)

Figure 11: TPC-H queries using tree and graph execution plans on 64 containers. (Panel: TPC-H Benchmark.)
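The text does not describe how the SAMPLE UDF in the second Freebase query is implemented. A standard way to draw a fixed-size uniform sample in a single pass is reservoir sampling with Vitter's Algorithm R, sketched below as an assumption; the function name and the smaller stream size are illustrative.

```python
import random


def reservoir_sample(stream, k, rng=random):
    """Algorithm R: uniform random sample of k items from a stream of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)   # fill the reservoir first
        else:
            j = rng.randint(0, i)  # uniform in [0, i], both ends inclusive
            if j < k:
                sample[j] = item   # keep item with probability k / (i + 1)
    return sample


rng = random.Random(7)
s = reservoir_sample(range(100_000), 1000, rng)
print(len(s))  # 1000
```

Memory stays bounded by k regardless of the table size, which is what makes a 1-million-row sample over 2.5 billion triples practical.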

SLAs and Query Generator Client: We use two types of SLAs: "normal" with α = 10 & γ = 80 and "high priority" with α = 20 & γ = 40. We also created a generator that launches queries following a Poisson distribution. More specifically, the generator computes the time k (in seconds) until the next query arrives as $f(k;\lambda) = \Pr(X = k) = \lambda^k e^{-\lambda}/k!$, where λ is the expected value of X (in seconds). We can achieve desired query rates by setting λ appropriately. For example, if λ = 10, one query is issued to the engine every 10 seconds on average.
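A sketch of such a generator. The Python standard library has no Poisson sampler, so this uses Knuth's multiplication method; the function names and the one-hour horizon are hypothetical.

```python
import math
import random


def poisson(lam, rng=random):
    """Draw k ~ Poisson(lam) with Knuth's multiplication method (fine for lam ~ 60)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def arrival_times(lam, horizon, rng=random):
    """Query issue times (seconds) within [0, horizon]; waits are Poisson(lam), lam > 0."""
    t, times = 0, []
    while True:
        t += poisson(lam, rng)
        if t > horizon:
            return times
        times.append(t)


rng = random.Random(0)
times = arrival_times(60, 3600, rng)  # lambda = 60: about one query per minute
print(len(times))                     # roughly 60 for a one-hour run
```

Halving λ to 30, as in the second phase of the experiments below, doubles the expected number of queries per hour.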

Algorithms and Measurements: We use our elastic VM layout allocation algorithm to adjust the size of the virtual infrastructure. As a baseline, we select a static layout that remains fixed over time. We use two such static allocations: small with (10, 4, 1) and large with (42, 12, 3); here, we designate within parentheses the number of containers per layout level, starting from the lowest level, which contains the data. We bootstrap our dynamic layout allocation algorithm with a medium static configuration (26, 8, 2). Finally, during the experiments, we measure the following: average query execution time, revenue, cost, and average number of VMs used at each layout level.

6.2 Near-Interactive Analytics 

In our first set of experiments, we validate the efficiency of the system by executing a single type of query at a time and measuring the corresponding turnaround time. We run each query 4 times and report the average of the last 3 measurements, a technique also followed by others [^. In this way, the observed execution times reflect the behavior of the system in live operation. We use the TPC-H benchmark with 64 VMs on Okeanos and a 3-level execution tree. Figure 10 compares the performance of our implementation, termed Exa-Tree, with that of Impala using 64 GB of data on 64 VMs. We observe that Exa-Tree is comparable to, and in some cases more efficient than, Impala for the types of queries we focus on in this work. This is due to the tree execution plans and to our data partitioning and placement scheme, which reduces network traffic during query execution through replication. As Impala runs entirely in memory, we were not able to run query 9 on it because it reached memory limits.

Figure 12: Execution times for Freebase queries. (Panel: Freebase Language Histogram.)

We also compared with a previous version of Exareme that used graphs to execute queries. Figure 11 shows the results. We observe that queries executed using tree execution plans run significantly faster. The main reason is the tree execution in combination with the exploitation of data partitioning. The previous version of the system used a lattice (all-to-all connections) to partition the data and perform aggregations in parallel. Using tree execution plans, we radically reduce the number of connections, improving system performance by up to an order of magnitude and offering near-interactive response times (as low as 35 seconds at the 64 GB scale).
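A back-of-the-envelope count shows why the tree plan reduces connections: in a lattice every operator of one stage talks to every operator of the next, whereas in a tree each operator sends its output to a single parent. The stage sizes below are hypothetical and only illustrate the growth rates.

```python
def lattice_connections(stage_sizes):
    """All-to-all links between every pair of consecutive stages."""
    return sum(a * b for a, b in zip(stage_sizes, stage_sizes[1:]))


def tree_connections(stage_sizes):
    """One link per non-root operator: each sends to exactly one parent."""
    return sum(stage_sizes[:-1])


stages = [128, 16, 1]  # hypothetical: 128 leaf operators, 16 internal, 1 root
print(lattice_connections(stages), tree_connections(stages))  # 2064 144
```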

6.3 Complex Analytics 

In the second set of experiments, we assess the efficiency of our engine on complex analytics expressed as UDFs, again by executing a single query at a time and measuring the respective execution times. As before, we run each query 4 times and report the average of the last 3 runs. We use the Freebase dataset and the two queries described earlier in this section, on 64 VMs. Figure 12 depicts the execution times attained for the two queries (All and Sample). In the first query (All), operators at the leaves of the execution tree take most of the time, as evaluating 2.4 billion regular expressions is expensive. The second query (Sample), being highly selective, completes in 339 seconds. It is worth mentioning that both queries produce similar distributions, as shown in Table 2. We also pre-processed the <object> column by extracting the language tag into an additional column of the table hosting Freebase. In this case, the histogram on the entire dataset is computed in merely 107 seconds without indexes and in 27 seconds using indexes. This performance highlights the near-real-time capabilities of our engine on large datasets.

6.4 Elasticity under Dynamic Workloads 

In this set of experiments, we examine both the effect of elasticity on query execution time and the profit generated. For these experiments, we used TPC-H with scale factor 8 in order to be able to run the queries with a variety of infrastructure sizes. The clients connected to the system issue queries 1 and 3 of the benchmark.

Figure 13: Query exec. time (left), revenue and cost (middle), and containers allocated per level (right).

Table 2: Freebase Language Histogram

(Panel: Comparison with Static Infrastructure.)

6.4.1 Layout Stabilization 

Here, we examine the stabilization of the virtual infrastructure. We use a workload with Q1 using the "normal" SLA and Poisson parameter λ = 60. The left part of Figure 13 shows the average execution time of the queries over time. We observe that our algorithm stabilizes quickly, reaching a steady state after 800 seconds. The delay in reaching a steady state at the beginning is due to the initial data transfer from the cloud storage to the containers. This explains the high average running time of the queries during that period. The middle part of Figure 13 shows the revenue and the corresponding cost of the allocated virtual infrastructure. At the beginning, the revenue is actually lower than the cost, and thus there is a loss instead of a profit. After the data transfer has finished, the profit stabilizes at a significantly higher value. Finally, the right part of Figure 13 shows the number of containers at each level of the layout over time. We observe that the virtual infrastructure adapts to the workload by taking the shape of a tree, with most of the allocated containers located at the data level.

6.4.2 Comparison with Static Infrastructures

Figure 14 depicts the profit gained when the static VM configurations are used to handle the workload, as well as the profit generated by our approach. We run the system for one hour using a client that issues query Q1 in three phases, each lasting 20 minutes. In the first and third phases, the Poisson parameter λ is set to 60, and in the second phase to 30 (i.e., the rate is doubled).

We readily ascertain that smaller infrastructures produce less revenue, as expected. Similarly, costs increase as more VMs and time quanta are used. The elastic layout allocator, however, produces a better-fitted layout that adapts to the workload changes and yields the highest profit compared to all static choices. Lastly, the elastic approach does generate less revenue than the large infrastructure. However, this is consistent with our design, as we optimize for profit and not for revenue.

Figure 14: Elastic configuration vs. static layouts.

6.4.3 Measuring Adaptivity under Dynamic Workloads

In our final set of experiments, we evaluate the adaptability of our elastic online algorithm in the presence of workloads whose features change over time. In particular, we employ a workload consisting of three stages of 1 hour each, where the query workload characteristics are perturbed between stages. As the default workload, we issue Q1 with Poisson parameter λ = 60 and the "normal" SLA. We change this default workload in the second stage using the following three options:

• Varying Query Rates: we vary the rate at which queries are issued by setting the Poisson parameter to λ = 30 in the second stage, essentially doubling the rate. The left part of Figure 15 shows the VMs allocated per layout level as well as the revenue. Our approach rapidly adapts to the varying workload and starts adjusting the number of VMs exactly at the phase boundaries. We also observe that the number of containers allocated increases along with the query rate as more revenue is generated.

• Varying SLAs: we change the SLA type to "high priority" during stage 2, while stages 1 and 3 have queries with the "normal" SLA. The middle part of Figure 15 shows our execution results: for queries with a higher price, our algorithm designates more VMs to generate additional revenue.

• Varying Query: in our final experiment, we vary the type of the queries issued. In stages 1 and 3 we use Q1, and in stage 2 we use Q3. The right part of Figure 15 shows once again the ability of the elastic algorithm to rapidly adapt the virtual infrastructure. We observe that for Q3 the profit drops because it is more expensive to execute. The algorithm allocates more containers in order to keep the profit positive.
Figure 15: Elastic containers allocated per tree level and revenue and cost for workload with different phases.


7. RELATED WORK 

There are several areas of data management where related work has been conducted. We briefly outline here key results from the fields of data warehouses, NoSQL systems, and elasticity.

7.1 Data Warehouses 

Data warehouses store very large volumes of data and are typically used for report generation and historical analyses to discover trends. Several systems have been implemented that are open-source (e.g., Hive [^), proprietary (e.g., Tenzing [^), or commercial (e.g., Vertica [^). The most popular open-source warehouses are based on MapReduce and typically offer high-level languages (e.g., SQL) to express queries. The latter are ultimately transformed into one or more MapReduce jobs [^. The MapReduce abstraction, however, is not efficient for the heavy aggregate queries that we target in this work. In MapReduce, multi-level aggregations can only be expressed using multiple jobs, rendering the approach less efficient than a tree abstraction. Moreover, the optimization goal of these systems is both to minimize the number of jobs they produce and to maximize parallelization in order to minimize their total execution time. The monetary cost of the resources is by and large ignored. The same holds for Dremel and Scuba, which have recently been proposed as specialized systems targeting query-tree executions and which, to the best of our knowledge, are not elastic.

7.2 NoSQL-Systems 

Several systems have been proposed to manage data in formats other than relational tables. Examples include MongoDB [^, Sawzall [^, Pig Latin [^, and FlumeJava. All of the above are either built on top of MapReduce, and so inherit all the pertinent weaknesses mentioned earlier, or built from scratch following approaches that are not suitable for the queries we target here [^. Furthermore, no such system offers a clean and simple way to define new UDFs and their properties so that they may be used during optimization.


7.3 Elasticity 

Several works focus on cloud elasticity [^, 37] and on dynamically allocating resources to increase performance. A recent work focuses on how to minimize the number of VMs used in order to save on cost, but this is not a plausible strategy in our setting, where queries are associated with SLAs and the goal is to maximize profit. Some works examine cloud elasticity in the context of in-memory distributed transactions [^. In our setting, the data are updated using bulk loading every day or week.

Elasticity for array databases has been examined recently [^. This work, similarly to our methodology, makes predictions about the future based on past queries. However, the proposed algorithm is only applicable to array-based scientific data (which only grow in size and are rarely deleted) and considers only increasing the size of the virtual infrastructure. We focus on a more generic problem.

To the best of our knowledge, none of the proposed solutions is suitable for our setting. Our proposal exploits cloud elasticity by automatically adjusting the size of the allocated virtual infrastructure to maximize profit, taking into account SLAs and the monetary cost of using cloud resources, which has generally been ignored thus far.


8. CONCLUSIONS 

We propose an elastic engine built on top of IaaS clouds to execute queries with tree execution plans, which arise in a large set of analytical SQL queries that involve heavy aggregations. We propose to lay out the allocated IaaS nodes in a tree shape so that we can naturally map the execution plans of these queries. Our elastic VM allocation algorithm dynamically changes the container layout based on the query workload monitored over a sliding time window. Our objective is to maximize the profit generated, taking into account the monetary cost of the resources as well as the revenue generated by the query workload. Finally, we have shown that our approach offers near-interactive response times and adapts quickly to workload changes.